Introduction to R Programming
Exercises II
Now it’s your turn to write some R code!
The following includes a list of exercises that you can complete on your own.
We are going to use the palmerpenguins data set for the
tasks ahead!
Functions reference list
For reference, here is a list of some useful functions.
If you have trouble with any of these functions, try reading the
documentation with ?function_name
Remember: all these functions take the data first.
filter()- Subset rows using column values
mutate()- Create and modify delete columns
rename()- Rename columns
select()- Subset columns using their names and types
summarise();summarize()- Summarise each group to fewer rows
group_by();ungroup()- Group by one or more variables
arrange()- Arrange rows by column values
count();tally()- Count observations by group
distinct()- Subset distinct/unique rows
pull()- Extract a single column
ifelse()- useful for coding of binary variables
case_when()- useful for recoding (when
ifelseis not enough)
- useful for recoding (when
separate()- separate two variables by some separator
pivot_wider()- turn data into wide format
pivot_longer()- turn data into long format
Task 1
Load the tidyverse and janitor
packages.
If
tidyverseorjanitorpackages are not installed yet (it will say[package] not found). Your first task, is then to install them. Note: this might take you a little bit of time.
Task 2
Read in the already cleaned palmerpenguins data set
using
read_csv- the following URL: https://raw.githubusercontent.com/allisonhorst/palmerpenguins/master/inst/extdata/penguins.csv
Assign the resulting data to penguins.
Then take a look a look at it using glimpse.
What kind of variables can you recognize?
Response:
Task 3
Only keep the variables: species, island
and sex.
Only keep variables 2 to 4.
Remove the column year.
Only include columns that contain “mm” in the variable name.
Task 4
Rename island to location.
Task 5
Filter the data so that species only includes
Chinstrap.
Filter the data so that species only includes
Chinstrap or Gentoo.
Filter the data so it includes only penguins that are
male and of the species Adelie.
Task 6
Create three new variables that calculates
bill_length_mm and bill_depth_mm and
flipper_length_mm from millimeter to centimeter.
Tip: divide the length value by 10.
Create a new variable called bill_depth_cat which has
two values:
- Everything above a bill depth of 18mm and 18mm itself is “high”
- Everything below a bill depth of 18mm is “low”
Create a new variable called species_short.
Adelieshould becomeAChinstrapshould becomeCGentooshould becomeG
Task 7
Calculate the average body_mass_g per
island.
If you haven’t done so already, try using the %>%
operator to do this.
Task 8
Use the pipe operator (%>%) to do all the operations
below.
- Filter the
penguinsdata so that it only includesChinstraporAdelie. - Rename
sextoobserved_sex - Only keep the variables
species,observed_sex,bill_length_mmandbill_depth_mm - Calculate the ratio between
bill_length_mmandbill_depth_mm - Sort the data by the highest ratio
Try to create the pipe step by step and execute code as you go to see if it works.
Once you are done, assign the data to new_penguins.
Calculate the average ratio by species and
sex, again using pipes.
Task 9
Count the number of penguins by island and species.
Task 10
Below is a dataset that needs some cleaning.
Use the skills that you have learned so far to turn the data into a tidy dataset.
animal_friends <- data.frame(
Names = c("Francis", "Catniss", "Theodor", "Eugenia"),
TheAnimals = c("Dog", "Cat", "Hamster", "Rabbit"),
Sex = c("m", "f", "m", "f"),
a_opterr = c("me", "me", "me", "me"),
`Age/Adopted/Condition` = c("8/2020/Very Good", "13/2019/Wild", "1/2021/Fair", "2/2020/Good")
) Start here:
If you are done, turn the final data into long format.